The Chamomile Scheme: An Optimized Algorithm for N-body simulations on Programmable Graphics Processing Units

Authors

  • Tsuyoshi Hamada
  • Toshiaki Iitaka
Abstract

We present an algorithm named the “Chamomile Scheme”. The scheme is fully optimized for calculating gravitational interactions on the latest programmable Graphics Processing Unit (GPU), the NVIDIA GeForce 8800 GTX, which has (a) small but fast shared memories (16 KB × 16) with no broadcasting mechanism and (b) floating-point arithmetic hardware capable of 500 Gflop/s, but only in single precision. Based on this scheme, we have developed a library for gravitational N-body simulations, “CUNBODY-1”, whose measured performance reaches 173 Gflop/s for 2048 particles and 256 Gflop/s for 131072 particles.
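For readers unfamiliar with the computation being accelerated, the following is a minimal sketch of direct-summation gravitational accelerations. It is not the Chamomile Scheme and not the CUNBODY-1 API; the function name `accelerations`, the softening parameter `eps`, the `tile` size, and the choice of units (G = 1) are all assumptions made for illustration. The loop over tiles of source particles only loosely mirrors how a GPU kernel might stage a block of particles in fast shared memory before every target particle reads it.

```python
import math


def accelerations(pos, mass, eps=1e-3, tile=4):
    """Direct-summation gravitational accelerations with G = 1 (assumed units).

    pos  : list of (x, y, z) tuples
    mass : list of particle masses
    eps  : Plummer softening length (hypothetical parameter; must be > 0 so
           that the self-interaction term evaluates to exactly zero)
    tile : number of source particles handled per pass (illustrative only)
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for start in range(0, n, tile):
        # One "tile" of source particles, analogous to a staged block.
        tile_pos = pos[start:start + tile]
        tile_mass = mass[start:start + tile]
        for i in range(n):
            xi, yi, zi = pos[i]
            ax = ay = az = 0.0
            for (xj, yj, zj), mj in zip(tile_pos, tile_mass):
                dx, dy, dz = xj - xi, yj - yi, zj - zi
                r2 = dx * dx + dy * dy + dz * dz + eps * eps
                inv_r3 = mj / (r2 * math.sqrt(r2))
                ax += dx * inv_r3
                ay += dy * inv_r3
                az += dz * inv_r3
            acc[i][0] += ax
            acc[i][1] += ay
            acc[i][2] += az
    return acc
```

The cost is proportional to N² pairwise interactions, which is why throughput figures such as those quoted in the abstract are reported as a function of particle count. Python floats are double precision, so this sketch does not reproduce the single-precision arithmetic of the hardware described above.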


Similar resources

IfI - 06 - 10 Clausthal - Zellerfeld 2006

In this paper, we present a novel approach for parallel sorting on stream processing architectures. It is based on adaptive bitonic sorting. For sorting n values utilizing p stream processor units, this approach achieves the optimal time complexity O((n log n)/p). While this makes our approach competitive with common sequential sorting algorithms not only from a theoretical viewpoint, it is als...

IfI - 06 - 11 Clausthal - Zellerfeld 2006

In this paper, we present a novel approach for parallel sorting on stream processing architectures. It is based on adaptive bitonic sorting. For sorting n values utilizing p stream processor units, this approach achieves the optimal time complexity O((n log n)/p). While this makes our approach competitive with common sequential sorting algorithms not only from a theoretical viewpoint, it is als...

Large-scale ferrofluid simulations on graphics processing units

We present an approach to molecular-dynamics simulations of ferrofluids on graphics processing units (GPUs). Our numerical scheme is based on a GPU-oriented modification of the Barnes–Hut (BH) algorithm designed to increase the parallelism of computations. For an ensemble consisting of a million ferromagnetic particles, the performance of the proposed algorithm on a Tesla M2050 GPU demonstrated...

Accelerating Euler Equations Numerical Solver on Graphics Processing Units

Finite volume numerical methods have been widely studied, implemented and parallelized on multiprocessor systems or on clusters. Modern graphics processing units (GPUs) provide architectures and new programming models that make it possible to harness their large processing power and to design computational fluid dynamics simulations at both high performance and low cost. We report on solving the 2D compres...

Real-Time Freedom Deformation Using Programmable Hardware

The recent introduction of programmable Graphics Processing Units (GPUs) has had a tremendous impact on real-time graphics. With their extended flexibility, GPUs can support tasks that go way beyond their originally intended rendering functionality. One domain where GPUs can be applicable is in accelerating geometric modeling applications. In this work, we present a real-time, GPU-based, evalua...


Publication date: 2008